Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Large-scale deep learning workloads increasingly suffer from I/O bottlenecks as datasets grow beyond local storage capacities and GPU compute outpaces network and disk latencies. While recent systems optimize data-loading time, they overlook the energy cost of I/O—a critical factor at large scale. We introduce EMLIO, an Efficient Machine Learning I/O service that jointly minimizes end-to-end data-loading latency (𝑇) and I/O energy consumption (𝐸) across variable-latency networked storage. EMLIO deploys a lightweight data-serving daemon on storage nodes that serializes and batches raw samples, streams them over TCP with out-of-order prefetching, and integrates seamlessly with GPU-accelerated (NVIDIA DALI) pre-processing on the client side. In exhaustive evaluations over local disk, LAN (0.05 ms & 10 ms round trip time (RTT)), and WAN (30 ms RTT) environments, EMLIO delivers on average up to 8.6X faster I/O and 10.9X lower energy use compared to state-of-the-art loaders, while maintaining constant performance and energy profiles irrespective of network distance. EMLIO’s service-based architecture offers a scalable blueprint for energy-aware I/O in next-generation AI clouds.more » « lessFree, publicly-accessible full text available November 15, 2026
-
Software applications and workloads, especially within the domains of Cloud computing and large-scale AI model training, exert considerable demand on computing resources, thus contributing significantly to the overall energy footprint of the IT industry. In this paper, we present an in-depth analysis of certain software coding practices that can play a substantial role in increasing the application’s overall energy consumption, primarily stemming from the suboptimal utilization of computing resources. Our study encompasses a thorough investigation of 16 distinct code smells and other coding malpractices across 31 real-world open-source applications written in Java and Python. Through our research, we provide compelling evidence that various common refactoring techniques, typically employed to rectify specific code smells, can unintentionally escalate the application’s energy consumption. We illustrate that a discerning and strategic approach to code smell refactoring can yield substantial energy savings. For selective refactorings, this yields a reduction of up to 13.1% of energy consumption and 5.1% of carbon emissions per workload on average. These findings underscore the potential of selective and intelligent refactoring to substantially increase energy efficiency of Cloud software systems.more » « less
-
Adaptive bitrate (ABR) algorithms play a critical role in video streaming by making optimal bitrate decisions in dynamically changing network conditions to provide a high quality of experience (QoE) for users. However, most existing ABRs suffer from limitations such as predefined rules and incorrect assumptions about streaming parameters. They often prioritize higher bitrates and ignore the corresponding energy footprint, resulting in increased energy consumption, especially for mobile device users. Additionally, most ABR algorithms do not consider perceived quality, leading to suboptimal user experience. This article proposes a novel ABR scheme called GreenABR+, which utilizes deep reinforcement learning to optimize energy consumption during video streaming while maintaining high user QoE. Unlike existing rule-based ABR algorithms, GreenABR+ makes no assumptions about video settings or the streaming environment. GreenABR+ model works on different video representation sets and can adapt to dynamically changing conditions in a wide range of network scenarios. Our experiments demonstrate that GreenABR+ outperforms state-of-the-art ABR algorithms by saving up to 57% in streaming energy consumption and 57% in data consumption while providing up to 25% more perceptual QoE due to up to 87% less rebuffering time and near-zero capacity violations. The generalization and dynamic adaptability make GreenABR+ a flexible solution for energy-efficient ABR optimization.more » « less
-
Advancements in mobile hardware and streaming technologies enable high-quality video streaming for mobile users, but this comes at a cost: a boost in power consumption. Despite detailed studies on power consumption during acquisition, existing studies fall short of considering recent technologies and, hence, of accurately capturing video playback power consumption. This paper presents a novel method to model mobile video playback power consumption. First, we identify the major components contributing to power consumption during video playback on mobile devices. Then, we develop models for each component to estimate their power consumption. Our experimental results show that our combined model estimates power consumption with 91% mean accuracy. Furthermore, our model maintains its high accuracy on an unseen device, achieving 88% mean accuracy despite the hardware and screen heterogeneity.more » « less
-
The increasing complexity of AI workloads, especially distributed Large Language Model (LLM) training, places significant strain on the networking infrastructure of parallel data centers and supercomputing systems. While Equal-Cost Multi-Path (ECMP) routing distributes traffic over parallel paths, hash collisions often lead to imbalanced network resource utilization and performance bottlenecks. This paper presents FlowTracer, a tool designed to analyze network path utilization and evaluate different routing strategies. Unlike tools that introduce additional traffic, FlowTracer aids in debugging network inefficiencies by passively monitoring and correlating user workload flows. As a result, FlowTracer does not interfere with ongoing data transfers, enabling analysis with minimal overhead, which is an important factor when debugging and fine-tuning routing schemes in production systems. FlowTracer can provide detailed insights into traffic distribution and can help identify the root causes of performance degradation, such as hash collisions. With FlowTracer’s flow-level insights, system operators can optimize routing, reduce congestion, and improve the performance of distributed AI workloads. We use a RoCEv2-enabled cluster with a leaf-spine network and 16 400-Gbps nodes to demonstrate how FlowTracer can be used to compare the flow imbalances of ECMP routing against a statically configured network. The example showcases a 30% reduction in imbalance, as measured by a new metric we introduce.more » « lessFree, publicly-accessible full text available June 8, 2026
-
The growing adoption of cloud, edge, and distributed computing, as well as the rise in the use of AI/ML workloads, have created a significant need to measure, monitor, and reduce the carbon emissions associated with these resource-intensive tasks. One significant but often overlooked source of emissions is data transfers over wide-area networks (WANs), primarily due to the challenges in accurately measuring the carbon footprint of end-to-end network paths. We introduce a novel mechanism to measure network carbon footprints and propose strategies for optimizing the scheduling of network-intensive tasks. We show that users can achieve significant carbon savings by shifting data transfer tasks across time and geographic regions based on local carbon intensity.more » « lessFree, publicly-accessible full text available March 1, 2026
-
Efficiently transferring data over long-distance, high-speed networks requires optimal utilization of available network bandwidth. One effective method to achieve this is through the use of parallel TCP streams. This approach allows applications to leverage network parallelism, thereby enhancing transfer throughput. However, determining the ideal number of parallel TCP streams can be challenging due to non-deterministic background traffic sharing the network, as well as non-stationary and partially observable network signals. We present a novel learning-based approach that utilizes deep reinforcement learning (DRL) to determine the optimal number of parallel TCP streams. Our DRL-based algorithm is designed to intelligently utilize available network bandwidth while adapting to different network conditions. Unlike rule-based heuristics, which lack generalization in unknown network scenarios, our DRL-based solution can dynamically adjust the parallel TCP stream numbers to optimize network bandwidth utilization without causing network congestion and ensuring fairness among competing transfers. We conducted extensive experiments to evaluate our DRL-based algorithm’s performance and compared it with several state-of-the-art online optimization algorithms. The results demonstrate that our algorithm can identify nearly optimal solutions 40% faster while achieving up to 15% higher throughput. Furthermore, we show that our solution can prevent network congestion and distribute the available network resources fairly among competing transfers, unlike a discriminatory algorithm.more » « less
-
Network Function Virtualization (NFV) platforms consume significant energy, introducing high operational costs in edge and data centers. This paper presents a novel framework called GreenNFV that optimizes resource usage for network function chains using deep reinforcement learning. GreenNFV optimizes resource parameters such as CPU sharing ratio, CPU frequency scaling, last-level cache (LLC) allocation, DMA buffer size, and packet batch size. GreenNFV learns the resource scheduling model from the benchmark experiments and takes Service Level Agreements (SLAs) into account to optimize resource usage models based on the different throughput and energy consumption requirements. Our evaluation shows that GreenNFV models achieve high transfer throughput and low energy consumption while satisfying various SLA constraints. Specifically, GreenNFV with Throughput SLA can achieve 4.4× higher throughput and 1.5× better energy efficiency over the baseline settings, whereas GreenNFV with Energy SLA can achieve 3× higher throughput while reducing energy consumption by 50%.more » « less
An official website of the United States government
